Adding support for HPE DL380 servers #19

Merged
merged 60 commits into from
Feb 22, 2024

jenniferKaiser21 (Collaborator) commented Feb 21, 2024

Summary

This PR extends FishyMetrics to support HPE ProLiant DL380 Gen10 servers as Prometheus scrape targets. It resolves issue #17.

Notably, it adds metrics for all drive types: NVMe, logical, and physical disk drives.

Additionally, the template has been updated to include the DL380 module in the dropdown menu.

Testing

Tested locally against an inventory of 8 storage-specific servers whose configurations include multiple physical disk drives, NVMe drives, and logical disk drives. Metrics for each drive type are obtained from their own, differing endpoints.

go run . -user 'REDACTED' -password 'REDACTED'

{"level":"info","ts":"2024-02-21T10:43:00-05:00","caller":"fishymetrics/main.go:422","msg":"started fishymetrics service","app":"fishymetrics","host":"REDACTED"}
{"level":"info","ts":"2024-02-21T10:43:08-05:00","caller":"fishymetrics/main.go:111","msg":"started scrape","app":"fishymetrics","host":"REDACTED","module":"dl380","target":"REDACTED","trace_id":"756e15f9-9051-4e78-9f28-2ee57b4f4669"}
{"level":"info","ts":"2024-02-21T10:43:36-05:00","caller":"fishymetrics/main.go:454","msg":"finished handling","app":"fishymetrics","host":"REDACTED","module":"dl380","target":"REDACTED","sourceAddr":"[::1]:50111","method":"GET","url":"/scrape?target=REDACTED&module=dl380","proto":"HTTP/1.1","status":200,"elapsed_time_sec":27.629735875,"trace_id":"756e15f9-9051-4e78-9f28-2ee57b4f4669"}
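The request URL in the last log line shows the two query parameters the scrape endpoint received, `target` and `module`. As a minimal sketch of building such a request URL in Go (the base address and target host below are placeholders, and `scrapeURL` is a hypothetical helper, not part of FishyMetrics):

```go
package main

import (
	"fmt"
	"net/url"
)

// scrapeURL is a hypothetical helper that builds a /scrape request URL
// for one target and module. base is the address of a running exporter
// instance and is an assumption made for illustration.
func scrapeURL(base, target, module string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = "/scrape"
	q := url.Values{}
	q.Set("target", target)
	q.Set("module", module)
	// Encode sorts parameters by key, so module appears before target.
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	// Placeholder values; substitute a real exporter address and BMC host.
	s, err := scrapeURL("http://localhost:8080", "bmc.example.com", "dl380")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(s)
}
```

The parameter order differs from the logged URL because `url.Values.Encode` sorts by key; query parameter order does not affect how the parameters are parsed.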

The elapsed time (about 27.6 seconds in the log above) reflects collecting metrics from 20+ unique endpoints, because storage-configured servers have a larger number of storage drives.
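For illustration only, here is a sketch of one way to query many endpoints concurrently and wait for all of them. It makes no claim about how FishyMetrics actually schedules its requests; `collectAll`, the `result` type, the stub fetch function, and the endpoint paths are all assumptions made for this example.

```go
package main

import (
	"fmt"
	"sync"
)

// result pairs an endpoint path with the body (or error) fetched from it.
type result struct {
	endpoint string
	body     string
	err      error
}

// collectAll calls fetch once per endpoint, each call in its own goroutine,
// waits for all of them, and returns the results in input order.
func collectAll(endpoints []string, fetch func(string) (string, error)) []result {
	results := make([]result, len(endpoints))
	var wg sync.WaitGroup
	for i, ep := range endpoints {
		wg.Add(1)
		go func(i int, ep string) {
			defer wg.Done()
			body, err := fetch(ep)
			// Each goroutine writes only its own slice index, so no lock is needed.
			results[i] = result{endpoint: ep, body: body, err: err}
		}(i, ep)
	}
	wg.Wait()
	return results
}

func main() {
	// Illustrative paths only; the real endpoint list depends on the server.
	endpoints := []string{"/thermal", "/power", "/drives/0", "/drives/1"}
	// Stub fetcher standing in for an HTTP GET against the BMC.
	stub := func(ep string) (string, error) { return "payload for " + ep, nil }
	for _, r := range collectAll(endpoints, stub) {
		fmt.Println(r.endpoint, r.body, r.err)
	}
}
```

With this pattern the total wall time tends toward the slowest single request rather than the sum of all requests, although a BMC may limit how many concurrent connections it serves well.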

Sample result from scraped DL380 server target:

# HELP dl380_disk_drive_status Current Disk Drive status 1 = OK, 0 = BAD
# TYPE dl380_disk_drive_status gauge
dl380_disk_drive_status{Id="0",location="1I:1:1",name="HpeSmartStorageDiskDrive"} 1
dl380_disk_drive_status{Id="0",location="1I:2:1",name="HpeSmartStorageDiskDrive"} 1
dl380_disk_drive_status{Id="0",location="1I:3:1",name="HpeSmartStorageDiskDrive"} 1
dl380_disk_drive_status{Id="1",location="1I:1:2",name="HpeSmartStorageDiskDrive"} 1
dl380_disk_drive_status{Id="1",location="1I:2:2",name="HpeSmartStorageDiskDrive"} 1
dl380_disk_drive_status{Id="1",location="1I:3:2",name="HpeSmartStorageDiskDrive"} 1
dl380_disk_drive_status{Id="2",location="1I:2:3",name="HpeSmartStorageDiskDrive"} 1
dl380_disk_drive_status{Id="2",location="1I:3:3",name="HpeSmartStorageDiskDrive"} 1
dl380_disk_drive_status{Id="3",location="1I:2:4",name="HpeSmartStorageDiskDrive"} 1
dl380_disk_drive_status{Id="3",location="1I:3:4",name="HpeSmartStorageDiskDrive"} 1
dl380_disk_drive_status{Id="4",location="2I:2:5",name="HpeSmartStorageDiskDrive"} 1
dl380_disk_drive_status{Id="4",location="2I:3:5",name="HpeSmartStorageDiskDrive"} 1
dl380_disk_drive_status{Id="5",location="2I:2:6",name="HpeSmartStorageDiskDrive"} 1
dl380_disk_drive_status{Id="5",location="2I:3:6",name="HpeSmartStorageDiskDrive"} 1
dl380_disk_drive_status{Id="6",location="2I:1:7",name="HpeSmartStorageDiskDrive"} 1
dl380_disk_drive_status{Id="6",location="2I:2:7",name="HpeSmartStorageDiskDrive"} 1
dl380_disk_drive_status{Id="7",location="2I:1:8",name="HpeSmartStorageDiskDrive"} 1
dl380_disk_drive_status{Id="7",location="2I:2:8",name="HpeSmartStorageDiskDrive"} 1
# HELP dl380_logical_drive_raid Current Logical Drive Raid
# TYPE dl380_logical_drive_raid gauge
dl380_logical_drive_raid{logicaldrivename="OS_Disk",name="HpeSmartStorageLogicalDrive",raid="1",volumeuniqueidentifier="600508B1001CFA5F1F36AA9D9EF35426"} 1
# HELP dl380_memory_status Current memory status 1 = OK, 0 = BAD
# TYPE dl380_memory_status gauge
dl380_memory_status{totalSystemMemoryGiB="384"} 1
# HELP dl380_nvme_drive_status_TEST Current NVME status 1 = OK, 0 = BAD
# TYPE dl380_nvme_drive_status_TEST gauge
dl380_nvme_drive_status_TEST{id="DA000000",protocol="NVMe",serviceLabel="Box 3:Bay 7"} 1
dl380_nvme_drive_status_TEST{id="DA000001",protocol="NVMe",serviceLabel="Box 3:Bay 8"} 1
# HELP dl380_power_supply_output Power supply output in watts
# TYPE dl380_power_supply_output gauge
dl380_power_supply_output{memberId="0",sparePartNumber="P39385-001"} 105
dl380_power_supply_output{memberId="1",sparePartNumber="P39385-001"} 146
# HELP dl380_power_supply_status Current power supply status 1 = OK, 0 = BAD
# TYPE dl380_power_supply_status gauge
dl380_power_supply_status{memberId="0",sparePartNumber="P39385-001"} 1
dl380_power_supply_status{memberId="1",sparePartNumber="P39385-001"} 1
# HELP dl380_power_supply_total_capacity Total output capacity of all the power supplies
# TYPE dl380_power_supply_total_capacity gauge
dl380_power_supply_total_capacity{memberId="0"} 1600
# HELP dl380_power_supply_total_consumed Total output of all power supplies in watts
# TYPE dl380_power_supply_total_consumed gauge
dl380_power_supply_total_consumed{memberId="0"} 251
# HELP dl380_thermal_fan_speed Current fan speed in the unit of percentage, possible values are 0 - 100
# TYPE dl380_thermal_fan_speed gauge
dl380_thermal_fan_speed{name="Fan 1"} 12
dl380_thermal_fan_speed{name="Fan 2"} 12
dl380_thermal_fan_speed{name="Fan 3"} 12
dl380_thermal_fan_speed{name="Fan 4"} 16
dl380_thermal_fan_speed{name="Fan 5"} 20
dl380_thermal_fan_speed{name="Fan 6"} 20
# HELP dl380_thermal_fan_status Current fan status 1 = OK, 0 = BAD
# TYPE dl380_thermal_fan_status gauge
dl380_thermal_fan_status{name="Fan 1"} 1
dl380_thermal_fan_status{name="Fan 2"} 1
dl380_thermal_fan_status{name="Fan 3"} 1
dl380_thermal_fan_status{name="Fan 4"} 1
dl380_thermal_fan_status{name="Fan 5"} 1
dl380_thermal_fan_status{name="Fan 6"} 1
# HELP dl380_thermal_sensor_status Current sensor status 1 = OK, 0 = BAD
# TYPE dl380_thermal_sensor_status gauge
dl380_thermal_sensor_status{name="01-Inlet Ambient"} 1
dl380_thermal_sensor_status{name="02-CPU 1"} 1
dl380_thermal_sensor_status{name="03-CPU 2"} 1
dl380_thermal_sensor_status{name="04-P1 DIMM 1-6"} 1
dl380_thermal_sensor_status{name="06-P1 DIMM 7-12"} 1
dl380_thermal_sensor_status{name="08-P2 DIMM 1-6"} 1
dl380_thermal_sensor_status{name="10-P2 DIMM 7-12"} 1
dl380_thermal_sensor_status{name="12-HD Max"} 1
dl380_thermal_sensor_status{name="13-Exp Bay Drive"} 1
dl380_thermal_sensor_status{name="14-Stor Batt 1"} 1
dl380_thermal_sensor_status{name="15-Front Ambient"} 1
dl380_thermal_sensor_status{name="16-VR P1"} 1
dl380_thermal_sensor_status{name="17-VR P2"} 1
dl380_thermal_sensor_status{name="18-VR P1 Mem 1"} 1
dl380_thermal_sensor_status{name="19-VR P1 Mem 2"} 1
dl380_thermal_sensor_status{name="20-VR P2 Mem 1"} 1
dl380_thermal_sensor_status{name="21-VR P2 Mem 2"} 1
dl380_thermal_sensor_status{name="22-Chipset"} 1
dl380_thermal_sensor_status{name="23-BMC"} 1
dl380_thermal_sensor_status{name="24-BMC Zone"} 1
dl380_thermal_sensor_status{name="25.1-HD Controller-Add-in card"} 1
dl380_thermal_sensor_status{name="25.2-HD Controller-I/O controlle"} 1
dl380_thermal_sensor_status{name="25.3-HD Controller-Add-in card"} 1
dl380_thermal_sensor_status{name="26-HD Cntlr Zone"} 1
dl380_thermal_sensor_status{name="28.1-LOM Card-I/O module"} 1
dl380_thermal_sensor_status{name="28.2-LOM Card-I/O module"} 1
dl380_thermal_sensor_status{name="28.3-LOM Card-I/O module"} 1
dl380_thermal_sensor_status{name="29-LOM Card Zone"} 1
dl380_thermal_sensor_status{name="30.1-PCI 1-I/O module"} 1
dl380_thermal_sensor_status{name="30.2-PCI 1-I/O module"} 1
dl380_thermal_sensor_status{name="30.3-PCI 1-I/O module"} 1
dl380_thermal_sensor_status{name="31-PCI 1 Zone"} 1
dl380_thermal_sensor_status{name="32.1-PCI 2-Add-in card"} 1
dl380_thermal_sensor_status{name="32.2-PCI 2-I/O controller"} 1
dl380_thermal_sensor_status{name="32.3-PCI 2-Add-in card"} 1
dl380_thermal_sensor_status{name="32.4-PCI 2-Add-in card"} 1
dl380_thermal_sensor_status{name="33-PCI 2 Zone"} 1
dl380_thermal_sensor_status{name="34.1-PCI 3-Add-in card"} 1
dl380_thermal_sensor_status{name="34.2-PCI 3-I/O controller"} 1
dl380_thermal_sensor_status{name="34.3-PCI 3-Add-in card"} 1
dl380_thermal_sensor_status{name="34.4-PCI 3-Add-in card"} 1
dl380_thermal_sensor_status{name="35-PCI 3 Zone"} 1
dl380_thermal_sensor_status{name="53-Battery Zone"} 1
dl380_thermal_sensor_status{name="54-P/S 1 Inlet"} 1
dl380_thermal_sensor_status{name="55-P/S 2 Inlet"} 1
dl380_thermal_sensor_status{name="56-P/S 1"} 1
dl380_thermal_sensor_status{name="57-P/S 2"} 1
dl380_thermal_sensor_status{name="58-P/S 2 Zone"} 1
dl380_thermal_sensor_status{name="59-E-Fuse"} 1
dl380_thermal_sensor_status{name="96-CPU 1 PkgTmp"} 1
dl380_thermal_sensor_status{name="97-CPU 2 PkgTmp"} 1
# HELP dl380_thermal_sensor_temperature Current sensor temperature reading in Celsius
# TYPE dl380_thermal_sensor_temperature gauge
dl380_thermal_sensor_temperature{name="01-Inlet Ambient"} 22
dl380_thermal_sensor_temperature{name="02-CPU 1"} 40
dl380_thermal_sensor_temperature{name="03-CPU 2"} 40
dl380_thermal_sensor_temperature{name="04-P1 DIMM 1-6"} 42
dl380_thermal_sensor_temperature{name="06-P1 DIMM 7-12"} 40
dl380_thermal_sensor_temperature{name="08-P2 DIMM 1-6"} 46
dl380_thermal_sensor_temperature{name="10-P2 DIMM 7-12"} 42
dl380_thermal_sensor_temperature{name="12-HD Max"} 35
dl380_thermal_sensor_temperature{name="13-Exp Bay Drive"} 35
dl380_thermal_sensor_temperature{name="14-Stor Batt 1"} 32
dl380_thermal_sensor_temperature{name="15-Front Ambient"} 35
dl380_thermal_sensor_temperature{name="16-VR P1"} 43
dl380_thermal_sensor_temperature{name="17-VR P2"} 46
dl380_thermal_sensor_temperature{name="18-VR P1 Mem 1"} 38
dl380_thermal_sensor_temperature{name="19-VR P1 Mem 2"} 38
dl380_thermal_sensor_temperature{name="20-VR P2 Mem 1"} 37
dl380_thermal_sensor_temperature{name="21-VR P2 Mem 2"} 38
dl380_thermal_sensor_temperature{name="22-Chipset"} 55
dl380_thermal_sensor_temperature{name="23-BMC"} 80
dl380_thermal_sensor_temperature{name="24-BMC Zone"} 49
dl380_thermal_sensor_temperature{name="25.1-HD Controller-Add-in card"} 54
dl380_thermal_sensor_temperature{name="25.2-HD Controller-I/O controlle"} 63
dl380_thermal_sensor_temperature{name="25.3-HD Controller-Add-in card"} 55
dl380_thermal_sensor_temperature{name="26-HD Cntlr Zone"} 55
dl380_thermal_sensor_temperature{name="28.1-LOM Card-I/O module"} 79
dl380_thermal_sensor_temperature{name="28.2-LOM Card-I/O module"} 61
dl380_thermal_sensor_temperature{name="28.3-LOM Card-I/O module"} 63
dl380_thermal_sensor_temperature{name="29-LOM Card Zone"} 44
dl380_thermal_sensor_temperature{name="30.1-PCI 1-I/O module"} 68
dl380_thermal_sensor_temperature{name="30.2-PCI 1-I/O module"} 58
dl380_thermal_sensor_temperature{name="30.3-PCI 1-I/O module"} 58
dl380_thermal_sensor_temperature{name="31-PCI 1 Zone"} 48
dl380_thermal_sensor_temperature{name="32.1-PCI 2-Add-in card"} 53
dl380_thermal_sensor_temperature{name="32.2-PCI 2-I/O controller"} 72
dl380_thermal_sensor_temperature{name="32.3-PCI 2-Add-in card"} 64
dl380_thermal_sensor_temperature{name="32.4-PCI 2-Add-in card"} 57
dl380_thermal_sensor_temperature{name="33-PCI 2 Zone"} 50
dl380_thermal_sensor_temperature{name="34.1-PCI 3-Add-in card"} 53
dl380_thermal_sensor_temperature{name="34.2-PCI 3-I/O controller"} 72
dl380_thermal_sensor_temperature{name="34.3-PCI 3-Add-in card"} 62
dl380_thermal_sensor_temperature{name="34.4-PCI 3-Add-in card"} 58
dl380_thermal_sensor_temperature{name="35-PCI 3 Zone"} 50
dl380_thermal_sensor_temperature{name="53-Battery Zone"} 46
dl380_thermal_sensor_temperature{name="54-P/S 1 Inlet"} 35
dl380_thermal_sensor_temperature{name="55-P/S 2 Inlet"} 40
dl380_thermal_sensor_temperature{name="56-P/S 1"} 49
dl380_thermal_sensor_temperature{name="57-P/S 2"} 55
dl380_thermal_sensor_temperature{name="58-P/S 2 Zone"} 47
dl380_thermal_sensor_temperature{name="59-E-Fuse"} 40
dl380_thermal_sensor_temperature{name="96-CPU 1 PkgTmp"} 58
dl380_thermal_sensor_temperature{name="97-CPU 2 PkgTmp"} 65
# HELP up Was the last scrape of chassis monitor successful.
# TYPE up gauge
up 1
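
The status metrics above follow a "1 = OK, 0 = BAD" convention. As a minimal sketch of that mapping, assuming the health value arrives as a string such as "OK", "Warning", or "Critical" (the health values defined by Redfish), the following renders one sample line in the Prometheus text format. `statusGauge` and `metricLine` are hypothetical names for this example and are not taken from the exporter's code.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// statusGauge maps a health string to 1 for "OK" (case-insensitive)
// and 0 for anything else, mirroring the "1 = OK, 0 = BAD" HELP text.
func statusGauge(health string) float64 {
	if strings.EqualFold(health, "OK") {
		return 1
	}
	return 0
}

// metricLine renders one sample in Prometheus text exposition format,
// with labels sorted by name. %q quoting is adequate for simple ASCII
// label values like the ones in the sample output.
func metricLine(name string, labels map[string]string, value float64) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	pairs := make([]string, 0, len(keys))
	for _, k := range keys {
		pairs = append(pairs, fmt.Sprintf("%s=%q", k, labels[k]))
	}
	return fmt.Sprintf("%s{%s} %g", name, strings.Join(pairs, ","), value)
}

func main() {
	labels := map[string]string{"name": "Fan 1"}
	fmt.Println(metricLine("dl380_thermal_fan_status", labels, statusGauge("OK")))
	fmt.Println(metricLine("dl380_thermal_fan_status", labels, statusGauge("Critical")))
}
```

A production exporter would normally build metrics with the Prometheus Go client library rather than formatting lines by hand; the manual formatting here only shows how a sample line is composed.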

jenniferKaiser21 and others added 30 commits February 7, 2024 16:15

CLAassistant commented Feb 21, 2024

CLA assistant check
All committers have signed the CLA.

Review threads (outdated, resolved):
hpe/dl380/metrics.go
hpe/dl380/exporter.go
hpe/dl380/exporter.go

derrick-dacosta (Collaborator) left a comment

LGTM!

derrick-dacosta merged commit daa57b5 into Comcast:main on Feb 22, 2024
2 of 3 checks passed
derrick-dacosta linked an issue on Feb 22, 2024 that may be closed by this pull request
jenniferKaiser21 deleted the hpe_dl380 branch on June 11, 2024 20:42
Development

Successfully merging this pull request may close these issues.

Add HPE ProLiant DL380 Gen10 Support
3 participants